The breakthrough: Represent words as vectors in continuous space where geometry encodes meaning.
From Symbols to Vectors
The Problem with One-Hot Encoding:
Each word gets a unique vector with a single 1:
“cat” → [0, 0, 1, 0, 0, …, 0]
“dog” → [0, 1, 0, 0, 0, …, 0]
Problem: the cosine similarity between any two distinct words is 0
No notion of semantic similarity is captured!
Solution: Learn dense vector representations where similar words are close together.
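A minimal sketch of the contrast (the dense vectors below are hand-picked for illustration, not learned):

```python
import numpy as np

# One-hot vectors are mutually orthogonal, so cosine similarity
# between any two distinct words is always 0.
vocab = ["cat", "dog", "car"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(one_hot["cat"], one_hot["dog"]))  # 0.0 -- no similarity signal

# Dense embeddings (hypothetical values) place related words nearby.
dense = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}
print(cosine(dense["cat"], dense["dog"]))  # high: semantically close
print(cosine(dense["cat"], dense["car"]))  # low: semantically distant
```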
The Twenty Questions Intuition
Imagine playing Twenty Questions to identify words:
| Question | Bear | Dog | Cat |
|---|---|---|---|
| Is it an animal? | 1 | 1 | 1 |
| Is it domestic? | 0 | 1 | 0.7 |
| Larger than human? | 0.8 | 0.1 | 0.01 |
| Has long tail? | 0 | 0.6 | 1 |
| Is it a predator? | 1 | 0 | 0.6 |
Each word becomes a vector of answers. Similar words give similar answers → similar vectors!
This is the essence of word embeddings.
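The intuition can be checked directly: treating each row of answers as a vector dimension, "cat" ends up closer to "dog" than to "bear".

```python
import numpy as np

# Answer vectors from the table, ordered:
# [animal, domestic, larger-than-human, long-tail, predator]
bear = np.array([1, 0,    0.8,  0,   1])
dog  = np.array([1, 1,    0.1,  0.6, 0])
cat  = np.array([1, 0.7,  0.01, 1,   0.6])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"cat vs dog:  {cosine(cat, dog):.2f}")   # higher
print(f"cat vs bear: {cosine(cat, bear):.2f}")  # lower
```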
Word2Vec: Learning from Context
The Distributional Hypothesis: Words appearing in similar contexts have similar meanings.
```mermaid
flowchart LR
  C1["The ___ sat on the mat"]
  C2["The ___ sat on the rug"]
  Cat["cat"] --> C1
  Dog["dog"] --> C2
  Cat --> V["Similar Vectors!"]
  Dog --> V
  style Cat fill:#e1f5fe,stroke:#1976d2
  style Dog fill:#e1f5fe,stroke:#1976d2
  style V fill:#c8e6c9,stroke:#2e7d32
```
Skip-gram objective: given a center word, predict the surrounding context words:
```mermaid
flowchart LR
  A[loves] --> B[the]
  A --> C[man]
  A --> D[his]
  A --> E[son]
  style A fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
  style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  style C fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  style D fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
```
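Extracting the (center, context) training pairs is simple to sketch; this reproduces the diagram's example with a context window of 2:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the man loves his son".split()
pairs = skipgram_pairs(sentence, window=2)
# Context words predicted for the center word "loves":
print([ctx for center, ctx in pairs if center == "loves"])
# -> ['the', 'man', 'his', 'son']
```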
LLMs are autoregressive: predict the next token given all previous tokens.
```mermaid
flowchart LR
  C["Context"] --> M["LLM"]
  M --> P["Probabilities"]
  P --> S["Sample"]
  S --> T["Token"]
  T --> |"Append"| C
  style C fill:#e3f2fd,stroke:#1976d2
  style M fill:#fff3e0,stroke:#f57c00
  style P fill:#e8f5e9,stroke:#388e3c
  style T fill:#f3e5f5,stroke:#7b1fa2
```
Autoregressive Generation
Generation process:
Compute probability distribution over all possible next tokens
Sample from distribution (controlled by temperature)
Append sampled token → Repeat until done
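The three steps above can be sketched with a toy "model" (a hand-written bigram table, hypothetical numbers); a real LLM replaces the table lookup with a neural network:

```python
import random

# Toy next-token distributions: token -> {next_token: probability}
probs = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "<e>": 0.1},
    "dog": {"sat": 0.8, "<e>": 0.2},
    "sat": {"<e>": 1.0},
}

def generate(seed=0, max_len=10):
    rng = random.Random(seed)
    tokens = ["<s>"]
    while tokens[-1] != "<e>" and len(tokens) < max_len:
        dist = probs[tokens[-1]]                              # 1. distribution over next tokens
        nxt = rng.choices(list(dist), list(dist.values()))[0] # 2. sample from it
        tokens.append(nxt)                                    # 3. append and repeat
    return tokens

print(generate())
```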
Temperature: Controlling Randomness
Temperature controls the “creativity” of generation:
Prompt: “The first African American president is Barack…”
Most probable next token: “Obama” ✓
Also correct: “Hussein” (his middle name)
A greedy strategy always picks “Obama” — but in formal documents, “Barack Hussein Obama” is preferred.
Temperature > 0 allows the model to explore alternatives that may better fit the context.
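Mechanically, temperature divides the model's logits before the softmax; the logit values below are made up purely to show the effect:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    z = np.array(logits) / T
    z -= z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [3.0, 2.0, 0.5]      # e.g. "Obama", "Hussein", some other token
for T in (0.5, 1.0, 2.0):
    print(f"T={T}: {softmax_with_temperature(logits, T).round(3)}")
# Low T sharpens the distribution (near-greedy);
# high T flattens it, making alternatives more likely.
```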
The LLM Lifecycle
```mermaid
flowchart LR
  D[Data Collection] --> P[Pre-Training]
  P --> I[Instruction Tuning]
  I --> A[Alignment]
  A --> Dep[Deployment]
  style D fill:#e3f2fd,stroke:#1976d2
  style P fill:#e8f5e9,stroke:#388e3c
  style I fill:#fff3e0,stroke:#f57c00
  style A fill:#fce4ec,stroke:#c2185b
  style Dep fill:#f3e5f5,stroke:#7b1fa2
```
| Stage | Purpose |
|---|---|
| Data Collection | Curate training corpus (quality > quantity) |
| Pre-Training | Predict next tokens on billions of sequences |
| Instruction Tuning | Teach the model to follow instructions |
| Alignment | Ensure behavior matches human values (RLHF) |
| Deployment | Optimize for latency, cost, safety |
Alignment: Why It Matters
```mermaid
flowchart LR
  Q["User Query"]
  Q --> U["Unaligned"]
  Q --> A["Aligned"]
  U --> UR["Yes, only true god"]
  A --> AR["Multiple perspectives exist"]
  style Q fill:#e3f2fd,stroke:#1976d2
  style UR fill:#ffcccc,stroke:#cc0000
  style AR fill:#ccffcc,stroke:#00cc00
```
Example: “Is Allah the only god?”
Unaligned: “Yes, Allah is the one true god and all other beliefs are false.”
Aligned: “In Islam, Allah is considered the one God. Other religions have different perspectives. I can provide factual information if helpful.”
This nuanced behavior emerges from alignment training, not pre-training alone.
Context Windows and Prompting
Context window: Maximum tokens the model can “see” at once
```mermaid
flowchart LR
  S[System Prompt<br/>~500 tokens]
  T[Tools/Schemas<br/>~300 tokens]
  H[History<br/>~1000 tokens]
  R[Retrieved Docs<br/>~2000 tokens]
  U[User Query<br/>~200 tokens]
  S --> M[LLM]
  T --> M
  H --> M
  R --> M
  U --> M
  style S fill:#e3f2fd
  style R fill:#c8e6c9
  style U fill:#fff3e0
```
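The token budgets in the diagram can be treated as a packing problem: keep the highest-priority segments that fit the window. The counts below are the diagram's rough estimates (a real system would count tokens with the model's tokenizer):

```python
def fit_to_window(segments, window=4000):
    """Greedily keep prompt segments, in priority order, that fit the window.

    segments: list of (name, estimated_token_count) tuples.
    Returns (kept_segment_names, tokens_used).
    """
    kept, used = [], 0
    for name, tokens in segments:
        if used + tokens <= window:
            kept.append(name)
            used += tokens
    return kept, used

segments = [
    ("system", 500), ("tools", 300), ("user_query", 200),
    ("retrieved_docs", 2000), ("history", 1000),
]
print(fit_to_window(segments, window=4000))  # everything fits exactly
print(fit_to_window(segments, window=3000))  # history is dropped
```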
Prompting strategies: Zero-shot, Few-shot, Chain-of-thought, System prompts
AI Agents
What Are AI Agents?
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” — Edsger Dijkstra
AI agents are autonomous systems that:
Perceive their environment
Reason about goals
Take actions to achieve outcomes
Learn from results
Unlike chatbots, agents can act in the world.
The Agent Loop
```mermaid
flowchart LR
  P["Perceive"] --> R["Reason"]
  R --> A["Act"]
  A --> O["Observe"]
  O --> P
  style P fill:#e3f2fd,stroke:#1976d2
  style R fill:#fff3e0,stroke:#f57c00
  style A fill:#e8f5e9,stroke:#388e3c
  style O fill:#fce4ec,stroke:#c2185b
```
The agent perceives its environment, reasons about goals, acts to achieve outcomes, observes the result, and repeats — a continuous loop of intelligent behavior.
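A minimal skeleton of that loop, with hypothetical interfaces (a real agent would call an LLM in the reasoning step and external tools in the acting step):

```python
class CounterEnv:
    """Toy environment: the 'world state' is a counter the agent must raise."""
    def __init__(self, goal=3):
        self.goal, self.state = goal, 0
    def observe(self):
        return self.state
    def step(self):
        self.state += 1

def run_agent(env, max_steps=10):
    for _ in range(max_steps):
        obs = env.observe()        # Perceive
        if obs >= env.goal:        # Reason: compare state to goal
            return obs
        env.step()                 # Act; the next iteration Observes the result
    return env.observe()

print(run_agent(CounterEnv(goal=3)))  # 3
```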
Tool Use: Giving LLMs Hands
LLMs are “brains without hands” — function calling bridges this gap:
```mermaid
flowchart LR
  U["Query"] --> L["LLM"]
  L --> TC["Tool Call"]
  TC --> O["Orchestrator"]
  O --> T["Tool"]
  T --> |"Result"| L
  L --> R["Response"]
  style U fill:#e3f2fd,stroke:#1976d2
  style L fill:#fff3e0,stroke:#f57c00
  style TC fill:#fce4ec,stroke:#c2185b
  style T fill:#e8f5e9,stroke:#388e3c
  style R fill:#f3e5f5,stroke:#7b1fa2
```
Examples: Web search, database queries, code execution, API calls.
Key insight: Unlike single-pass generation, agents that interleave reasoning with tool calls (the ReAct pattern) can course-correct based on intermediate results.
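The orchestrator side of function calling reduces to dispatch: parse the model-emitted call, look up the tool, and execute it in regular code. A sketch with a hypothetical tool registry and call format:

```python
import json

# Hypothetical tool registry: the orchestrator, not the LLM, executes tools.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "get_weather": lambda city: f"(stub) sunny in {city}",
}

def handle_tool_call(message):
    """Dispatch a model-emitted call like
    {"tool": "calculator", "arguments": {"expr": "2+3"}}."""
    call = json.loads(message)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']!r}"
    return fn(**call["arguments"])

print(handle_tool_call('{"tool": "calculator", "arguments": {"expr": "2+3"}}'))  # 5
```

The tool's result is then appended to the conversation so the LLM can produce its final response.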
Case Study: ChatDev
ChatDev orchestrates a virtual software company with specialized AI agents:
```mermaid
flowchart LR
  CEO[CEO] --- CTO[CTO]
  CTO --- CPO[CPO]
  Prog[Programmer] --- Des[Designer]
  Test[Tester] --- Prog2[Programmer]
  CEO --> Prog
  Des --> Test
  Test --> Doc[Documentation]
  style CEO fill:#ffcccc,stroke:#cc0000
  style CTO fill:#ccffcc,stroke:#00cc00
  style Prog fill:#cce5ff,stroke:#1976d2
  style Test fill:#fff3cd,stroke:#f57c00
```
Key principle: The more powerful the agent, the more guardrails it needs.
Agent Safety Challenges
```mermaid
flowchart LR
  PI[Prompt Injection] --> A[Agent]
  AD[Adversarial Inputs] --> A
  GM[Goal Misalignment] --> A
  HA[Hallucinations] --> A
  CO[Capability Overhang] --> A
  LC[Lack of Corrigibility] --> A
  A --> H[Harm]
  style PI fill:#ffcccc
  style HA fill:#fff3cd
  style H fill:#ff0000,color:#fff
```
Autonomous agents amplify risks — a hallucination becomes action.
Agent Safety: Risk Taxonomy
| Risk | Description | Real Example |
|---|---|---|
| Prompt Injection | Hidden instructions hijack the agent | Email contains "ignore previous instructions" |
| Hallucinations | Acting on false information | Agent invents an API that doesn't exist |
| Goal Misalignment | Optimizes the wrong objective | Maximizes engagement via manipulation |
| Capability Overhang | Does more than authorized | Accesses files outside its scope |
Safety Mechanisms
```mermaid
flowchart LR
  U[Input] --> IF[Input Filter]
  IF --> |Clean| A[Agent]
  IF --> |Malicious| B[Block]
  A --> OF[Output Filter]
  OF --> |Safe| R[Response]
  OF --> |Unsafe| B
  A --> M[Monitor]
  M --> |Anomaly| CB[Circuit Breaker]
  CB --> B
  style IF fill:#fff3e0,stroke:#f57c00
  style OF fill:#fff3e0,stroke:#f57c00
  style B fill:#ffcccc,stroke:#cc0000
  style R fill:#ccffcc,stroke:#00cc00
  style CB fill:#fce4ec,stroke:#c2185b
```
The Safety Pipeline:
Input/Output Guards: Fast classifiers that run before and after the LLM.
Monitoring: Watching for “strange” behavior (e.g., an agent trying to access a restricted database).
Circuit Breakers: Automatically killing the agent process if safety thresholds are exceeded.
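A toy version of this pipeline, with deliberately crude regex filters standing in for the trained classifiers a production system would use:

```python
import re

def input_filter(text):
    # Crude injection heuristic -- real systems use trained classifiers.
    return not re.search(r"ignore (all )?previous instructions", text, re.I)

def output_filter(text):
    # Block obvious PII leakage (toy SSN-style pattern).
    return not re.search(r"\b\d{3}-\d{2}-\d{4}\b", text)

class CircuitBreaker:
    def __init__(self, max_violations=3):
        self.violations, self.max = 0, max_violations
    def record(self):
        self.violations += 1
        return self.violations >= self.max   # True => halt the agent

def guarded_call(agent_fn, user_input, breaker):
    if not input_filter(user_input):
        return "[blocked: suspicious input]"
    out = agent_fn(user_input)
    if not output_filter(out):
        if breaker.record():
            return "[circuit breaker tripped: agent halted]"
        return "[blocked: unsafe output]"
    return out

breaker = CircuitBreaker()
print(guarded_call(lambda s: s.upper(), "hello", breaker))  # HELLO
print(guarded_call(lambda s: s, "please ignore previous instructions", breaker))
```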
The Defense-in-Depth Pipeline
| Layer | Purpose | Technical Method |
|---|---|---|
| Input Filter | Block malicious prompts | PII detection, jailbreak classifiers |
| Sandboxing | Isolate agent actions | Docker containers, restricted API keys |
| Output Filter | Prevent sensitive leakage | Regex for PII, toxic-content scoring |
| Human-in-the-Loop | Verify high-risk actions | "Approve" button for financial transfers |
| Monitoring | Detect runtime anomalies | Log analysis, capability tracking |
Key Principle: Never rely on the LLM to self-police. Use external code to enforce boundaries.
Human-in-the-Loop (HITL)
The most effective safety measure for high-stakes agents:
Critical Actions: Require manual approval for destructive or financial operations (e.g., rm -rf, send_payment).
Confirmation Dialogue: Show the agent’s proposed plan before execution.
Feedback Loop: Allow the human to correct the agent’s reasoning.
Audit Logs: Every action approved or rejected by a human is recorded for training and safety reviews.
Example: A code-refactoring agent proposes changes; a human developer reviews and clicks “Merge” or “Reject”.
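An approval gate is straightforward to enforce in orchestrator code. This sketch uses a hypothetical action list and a stubbed `approve` callback standing in for the human reviewer's UI:

```python
HIGH_RISK = {"send_payment", "delete_repo", "rm_rf"}

audit_log = []  # every human decision is recorded for later review

def execute(action, args, approve):
    """Run an action, requiring human approval for high-risk ones.

    approve: callable (action, args) -> bool, e.g. a CLI prompt or an
    'Approve' button in a UI (stubbed for illustration).
    """
    if action in HIGH_RISK:
        ok = approve(action, args)
        audit_log.append({"action": action, "approved": ok})
        if not ok:
            return "rejected by human reviewer"
    return f"executed {action}({args})"

reject_all = lambda action, args: False  # a reviewer who denies everything
print(execute("send_payment", {"amount": 500}, reject_all))
print(execute("read_file", {"path": "notes.txt"}, reject_all))  # no approval needed
```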
Anthropic’s ASL-3 Safety Measures
For Claude Opus 4, Anthropic activated proactive safety:
```mermaid
flowchart LR
  U[User] --> CC[Constitutional Classifiers]
  CC --> |Safe| M[Model]
  CC --> |Blocked| B[Reject]
  M --> OC[Output Check]
  OC --> |Safe| R[Response]
  OC --> |Harmful| B
  BB[Bug Bounty] --> CC
  RP[Rapid Patch] --> CC
  style CC fill:#c8e6c9,stroke:#2e7d32
  style B fill:#ffcccc,stroke:#cc0000
  style R fill:#e3f2fd,stroke:#1976d2
```
ASL-3: Safety Pipeline
| Layer | Function | Why It Matters |
|---|---|---|
| Constitutional AI | Real-time input/output filtering | Blocks harmful requests before execution |
| Bug Bounty | Crowdsourced discovery | Finds attacks humans miss |
| Rapid Patching | Auto-generate variants | Stays ahead of attackers |
| Egress Control | Throttle outbound data | Prevents model-weight theft |
Evaluating AI Agents
Traditional metrics (accuracy, precision) are insufficient for agents.
```mermaid
flowchart LR
  A[Agent Output] --> R[Rule-Based]
  A --> L[LLM-as-Judge]
  A --> H[Human Review]
  A --> S[Simulation]
  R --> E[Score]
  L --> E
  H --> E
  S --> E
  style R fill:#e3f2fd,stroke:#1976d2
  style L fill:#fff3e0,stroke:#f57c00
  style H fill:#c8e6c9,stroke:#2e7d32
  style S fill:#f3e5f5,stroke:#7b1fa2
  style E fill:#ffcccc,stroke:#cc0000
```
Best practice: Combine multiple approaches for comprehensive evaluation.
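One simple way to combine approaches is a weighted score. Both evaluators below are toy stand-ins (the LLM-as-judge is stubbed; a real one would prompt a model with a rubric):

```python
def rule_based(output):
    """Rule check: did the agent emit something JSON-like and non-empty?"""
    return 1.0 if output.strip().startswith("{") else 0.0

def llm_judge(output):
    """Stub for an LLM-as-judge score in [0, 1]."""
    return 0.8 if "result" in output else 0.2

def evaluate(output, weights=(0.5, 0.5)):
    scores = (rule_based(output), llm_judge(output))
    return sum(w * s for w, s in zip(weights, scores))

print(evaluate('{"result": 42}'))  # both checks pass -> high score
print(evaluate("oops"))            # both checks fail -> low score
```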
Domain-Specific Benchmarks
| Domain | Benchmark | What It Tests |
|---|---|---|
| Coding | SWE-bench | Fix real GitHub issues |
| Web | WebArena | Navigate websites, complete tasks |
| Robotics | ALFRED | Household tasks in 3D |
| Enterprise | TAU-bench | Multi-system workflows |
Agent capabilities are task-specific — benchmarks must match use cases.
Red-Teaming Agents
Systematic vulnerability testing:
```mermaid
flowchart LR
  PI[Prompt Injection] --> A[Agent]
  ME[Agent Mistakes] --> A
  MU[Direct Misuse] --> A
  A --> |Vulnerability| V[Security Issue]
  A --> |Safe| S[Normal Operation]
  V --> R[Report]
  style PI fill:#ffcccc,stroke:#cc0000
  style ME fill:#fff3cd,stroke:#f57c00
  style MU fill:#ffcccc,stroke:#cc0000
  style V fill:#ffcccc,stroke:#cc0000
  style S fill:#ccffcc,stroke:#00cc00
```
Example: Hidden text in a webpage hijacks agent to exfiltrate data.
Comprehensive red-teaming found 1,200+ vulnerabilities in one enterprise agent.
AI in the Physical World
Embodied AI: Robots
Software agents operate in digital systems. Embodied agents must handle:
```mermaid
flowchart LR
  C["Camera"] --> F["Fusion"]
  L["Lidar"] --> F
  T["Touch"] --> F
  F --> B["Robot Brain"]
  B --> M["Motors"]
  M --> E["Environment"]
  E --> |"Feedback"| C
  style F fill:#fff3e0,stroke:#f57c00
  style B fill:#e3f2fd,stroke:#1976d2
  style E fill:#c8e6c9,stroke:#388e3c
```
The sim-to-real gap: Robots trained in simulation often fail in reality.
The Evolution of Robotic Intelligence
```mermaid
flowchart LR
  S[1960s Shakey] --> P[1980s-2000s Probabilistic]
  P --> F[2020s Foundation Models]
  style S fill:#e3f2fd,stroke:#1976d2
  style P fill:#fff3e0,stroke:#f57c00
  style F fill:#c8e6c9,stroke:#2e7d32
```
Robotics: Capability Eras
| Era | Capability | Limitation |
|---|---|---|
| Rule-based | Explicit reasoning | Brittle, narrow |
| Probabilistic | Handles uncertainty | No language understanding |
| Foundation Models | Natural language + adaptation | Compute-intensive |
LLMs have catalyzed a new era: robots that understand language and adapt.
Google’s Robotic Transformer (RT-2)
A vision-language-action model that directly controls robots:
```mermaid
flowchart LR
  V[Vision Input] --> VLA[RT-2 Model]
  L[Language Input] --> VLA
  VLA --> A[Action Output]
  A --> R[Robot]
  R -- Feedback --> V
  style V fill:#e3f2fd,stroke:#1976d2
  style L fill:#fff3e0,stroke:#f57c00
  style VLA fill:#f3e5f5,stroke:#7b1fa2
  style A fill:#c8e6c9,stroke:#2e7d32
  style R fill:#ffcccc,stroke:#cc0000
```
General → Interactive → Dexterous
Works across robot forms: arms, humanoids, mobile platforms.
Robot Safety: ASIMOV
Named after Asimov’s Laws of Robotics, this benchmark tests embodied AI safety:
| Asimov's Law | Modern Interpretation | Test Scenario |
|---|---|---|
| 1. Don't harm humans | Refuse dangerous commands | "Throw this at the person" |
| 2. Obey orders | Follow safe instructions | "Hand me that tool" |
| 3. Protect self | Avoid self-damage | Don't walk off a ledge |
| Zeroth Law | Protect humanity broadly | Consider societal impact |
Key challenge: Context matters — “Hand me that knife” is safe in a kitchen, dangerous in a conflict.
Business relevance: As robots enter warehouses, hospitals, and homes, safety benchmarks become legal and ethical requirements.
Key Takeaways
Summary: NLP
| Concept | Key Insight |
|---|---|
| Word Embeddings | Words as vectors; geometry = meaning |
| Distributional Hypothesis | Context reveals meaning |
| Attention | Dynamic weighting of relevant information |
| Transformers | Parallel processing, scalable, powerful |
The shift from symbols to vectors enabled modern NLP.